feat: add partitioned namespace by wojiaodoubao · Pull Request #5896 · lance-format/lance

wojiaodoubao · 2026-02-05T13:40:30Z

This is a sub-task of the partitioned namespace epic (#5741).

java/lance-jni/Cargo.toml

jackye1995 · 2026-02-06T05:03:17Z

rust/lance-namespace-impls/src/dir/manifest_ext.rs

+
+/// Request for creating multiple namespaces with a single merge insert.
+#[derive(Debug, Clone)]
+pub struct CreateMultiNamespacesRequest {


Why do we need this? I thought we only needed to create multiple tables in a namespace.

Oh I see, never mind: we want to create the namespaces that represent partition values.

I think we should add these as a part of the Lance Namespace operations, introduce BatchCreateNamespaces, BatchCreateTables, etc., those operations can be useful anyway even outside the context of partitioned namespace.

And then you don't need a dedicated extension just for manifest namespace to make it work.
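To make the suggestion concrete, here is a minimal sketch of what a batch namespace operation could look like. Everything in it is hypothetical: the `BatchCreateNamespacesRequest` and `BatchCreateNamespacesResponse` shapes, the `BatchNamespaceOps` trait, and the `InMemoryNamespace` backend are illustration only and are not part of the Lance Namespace spec or this PR.

```rust
use std::collections::BTreeSet;

/// Namespace identifier as a list of path segments (hypothetical alias).
type NamespaceId = Vec<String>;

/// Hypothetical request: create several namespaces in one call.
#[derive(Debug, Clone, Default)]
struct BatchCreateNamespacesRequest {
    ids: Vec<NamespaceId>,
}

/// Hypothetical response: reports which ids were new and which already existed.
#[derive(Debug, Default, PartialEq)]
struct BatchCreateNamespacesResponse {
    created: Vec<NamespaceId>,
    already_existed: Vec<NamespaceId>,
}

/// Hypothetical trait that a namespace implementation could provide.
trait BatchNamespaceOps {
    fn batch_create_namespaces(
        &mut self,
        request: &BatchCreateNamespacesRequest,
    ) -> BatchCreateNamespacesResponse;
}

/// Toy in-memory backend, used only to exercise the trait.
#[derive(Default)]
struct InMemoryNamespace {
    known: BTreeSet<NamespaceId>,
}

impl BatchNamespaceOps for InMemoryNamespace {
    fn batch_create_namespaces(
        &mut self,
        request: &BatchCreateNamespacesRequest,
    ) -> BatchCreateNamespacesResponse {
        let mut response = BatchCreateNamespacesResponse::default();
        for id in &request.ids {
            // BTreeSet::insert returns true only when the value was absent.
            if self.known.insert(id.clone()) {
                response.created.push(id.clone());
            } else {
                response.already_existed.push(id.clone());
            }
        }
        response
    }
}
```

With a general operation of this kind, the partitioned namespace could create all partition-value namespaces in one request instead of relying on a manifest-specific extension.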

> I think we should add these as a part of the Lance Namespace operations, introduce BatchCreateNamespaces, BatchCreateTables, etc.

Agreed, we can do this.

jackye1995 · 2026-02-06T05:13:18Z

rust/lance-namespace-impls/src/dir/manifest_ext.rs

+}
+
+#[derive(Debug, Default, Clone)]
+pub struct CreateMultiNamespacesRequestBuilder {


Just to start a separate thread: I am starting to wonder whether there is any real benefit in creating the sub-namespace structures. It seems the only purpose is that it is cool to list namespaces this way; it serves no practical purpose, since all pruning is done directly against the table's partition column values in __manifest. Would it make more sense to just not have those nested namespace structures?

Yes, having a table is sufficient for partition creation and pruning. The reason for retaining the namespace is that PartitionedNamespace is a type of DirectoryNamespace that follows the partition spec standard. If we only keep the table part, PartitionedNamespace can no longer be treated as a normal DirectoryNamespace.

I think from a consistency perspective, it would be better to retain it. Shall we retain it, or remove it for simplicity?

wojiaodoubao · 2026-02-25T15:21:16Z

I just finished the first version of the partitioned namespace and it is ready for review now. While implementing it, I found that the original design needs some updates; the reasons are explained below.

Explanation of the New Design

Compared to the previous partitioning spec, the current implementation introduces two key changes. Below I’ll explain the motivation behind these adjustments and would love to get further feedback and discussion.

1. Introducing TableSpec

In the earlier partition spec design, all tables shared the same schema, which was stored in the table metadata. A PartitionedNamespace contained multiple sub-namespaces named v{i}, representing different versions of the partition spec.

The problem with that design is that it makes schema evolution difficult. We cannot atomically modify the schema of all tables without breaking the semantic guarantee that “all tables share the same schema.”

Even if we support multi-partition transactions in the future, we would still be blocked at the step of updating the schema stored in table metadata, because updating table metadata itself does not have transactional semantics.

To solve this problem, I propose introducing a new abstraction: TableSpec, which encapsulates both the schema and the partition spec.

It is defined as:

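pub struct TableSpec {
    // Spec version; the sub-namespace name v{i} corresponds to this id.
    id: i32,
    // Schema shared by tables under this spec; stored in the namespace properties.
    schema: ArrowSchema,
    // Partition spec for this version; stored in the namespace properties.
    partition_spec: PartitionSpec,
}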

TableSpec is stored as a first-level sub-namespace under the PartitionedNamespace root. The v{i} naming convention corresponds to the id, while the schema and partition_spec are stored in the namespace properties.

Whenever a schema evolution or partition spec evolution occurs, we create a new sub-namespace. This operation is purely metadata-level. In the manifest namespace, creating a namespace is atomic.

New data must follow the new schema (i.e., tables are created and written under the new TableSpec namespace).

Existing data does not need to be modified and continues to use the previous schema and partition spec.

Another advantage of this design is that it makes it relatively straightforward to support branching functionality in the future.
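The evolution flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `schema` and `partition_spec` are plain strings standing in for `ArrowSchema` and `PartitionSpec`, `Root` is a hypothetical in-memory model of the PartitionedNamespace root, and starting ids at 1 is an arbitrary choice.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for TableSpec; strings replace ArrowSchema and
/// PartitionSpec so the sketch has no external dependencies.
#[derive(Debug, Clone, PartialEq)]
struct TableSpec {
    id: i32,
    schema: String,
    partition_spec: String,
}

impl TableSpec {
    /// Name of the first-level sub-namespace, following the v{i} convention.
    fn namespace_name(&self) -> String {
        format!("v{}", self.id)
    }
}

/// Hypothetical in-memory model of the root namespace:
/// sub-namespace name -> spec kept in that namespace's properties.
#[derive(Default)]
struct Root {
    specs: BTreeMap<String, TableSpec>,
}

impl Root {
    /// Evolve by adding a new spec; existing specs are never modified.
    /// Ids start at 1 here, which is an assumption of this sketch.
    fn evolve(&mut self, schema: &str, partition_spec: &str) -> TableSpec {
        let next_id = self
            .specs
            .values()
            .map(|s| s.id)
            .max()
            .map_or(1, |id| id + 1);
        let spec = TableSpec {
            id: next_id,
            schema: schema.to_string(),
            partition_spec: partition_spec.to_string(),
        };
        self.specs.insert(spec.namespace_name(), spec.clone());
        spec
    }

    /// New data is written under the spec with the highest id.
    fn current(&self) -> Option<&TableSpec> {
        self.specs.values().max_by_key(|s| s.id)
    }
}
```

Calling `evolve` twice yields the sub-namespaces `v1` and `v2`; the `v1` entry stays untouched, which mirrors the point that existing data keeps its previous schema and partition spec.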

The overall structure can be visualized as follows: [diagram not captured]

2. Using Deterministic Names for Namespaces

In the previous partition spec design, the name of a partition namespace was a 16-character base36 string. Any type of partition value would be mapped to such a 16-character base36 string.

The issue with this design is that it makes it difficult to resolve concurrency conflicts. In distributed scenarios, we may need to concurrently create tables and namespaces with the same partition values. Since partition values effectively serve as business primary keys, this becomes a consistency challenge.

Originally, I considered leveraging Lance’s merge-insert deduplication capability to enforce uniqueness at the business key level. However, merge-insert in Lance requires that:

- A column be explicitly defined as a primary key
- The primary key column must be non-null

In our current design, the __manifest partition field column must be nullable. Additionally, partition fields can evolve over time, meaning we cannot reliably define a fixed initial primary key column. This makes it infeasible to use merge-insert deduplication directly on partition fields.

To address this, I propose using deterministic names for partition namespaces. At each level, the namespace name is derived from the serialized string representation of the partition value.
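As a rough illustration of deterministic naming, the sketch below derives one namespace name per partition level from the value itself. The `PartitionValue` enum, the `__null__` marker, and the caller-supplied delimiter are all assumptions made for this example; the actual serialization format is not specified in this comment.

```rust
/// Hypothetical partition value type; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum PartitionValue {
    Null,
    Int(i64),
    Str(String),
}

impl PartitionValue {
    /// Serialize to the string used as the namespace name at one level.
    /// The exact format (including the null marker) is an assumption.
    fn namespace_name(&self) -> String {
        match self {
            PartitionValue::Null => "__null__".to_string(),
            PartitionValue::Int(v) => v.to_string(),
            PartitionValue::Str(s) => s.clone(),
        }
    }
}

/// Build an object id by joining the TableSpec namespace (for example "v1")
/// with one deterministic name per partition level.
fn partition_object_id(
    spec_namespace: &str,
    values: &[PartitionValue],
    delimiter: &str,
) -> String {
    let mut parts = vec![spec_namespace.to_string()];
    parts.extend(values.iter().map(PartitionValue::namespace_name));
    parts.join(delimiter)
}
```

Two writers that see the same partition values compute the same object id without coordinating. A real implementation would also need to escape any delimiter characters that appear inside string values, which this sketch does not do.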

With this approach:

- Namespace identity becomes deterministic and directly tied to partition values.
- We only need to define a primary key on the object_id column.
- We can then leverage merge-insert to resolve concurrency conflicts when creating new namespaces and tables.

This simplifies concurrency handling while preserving correctness in distributed environments.
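The deduplication effect of keying on object_id can be modeled with a small in-memory sketch. It only imitates the "insert when not matched" behaviour on a single key; `Manifest`, `MergeOutcome`, and `merge_insert` are hypothetical names, and real Lance merge-insert, commit conflicts, and retries are not modeled here.

```rust
use std::collections::HashMap;

/// Outcome of one merge-insert attempt in this sketch.
#[derive(Debug, PartialEq)]
enum MergeOutcome {
    Inserted,
    AlreadyExists,
}

/// Hypothetical in-memory stand-in for the __manifest table,
/// keyed by object_id (the only primary key column).
#[derive(Default)]
struct Manifest {
    rows: HashMap<String, String>,
}

impl Manifest {
    /// Insert a row only when no row with this object_id exists yet.
    fn merge_insert(&mut self, object_id: &str, object_type: &str) -> MergeOutcome {
        if self.rows.contains_key(object_id) {
            MergeOutcome::AlreadyExists
        } else {
            self.rows
                .insert(object_id.to_string(), object_type.to_string());
            MergeOutcome::Inserted
        }
    }
}
```

Because the object id is derived from the partition values, a second writer creating the same partition hits `AlreadyExists` instead of producing a duplicate row.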

codecov · 2026-02-25T16:47:28Z

Codecov Report

❌ Patch coverage is 66.99029% with 136 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-namespace-impls/src/udf.rs	45.97%	44 Missing and 3 partials ⚠️
rust/lance-namespace-impls/src/util.rs	80.54%	37 Missing and 6 partials ⚠️
rust/lance-arrow/src/schema.rs	57.77%	32 Missing and 6 partials ⚠️
rust/lance-namespace-impls/src/dir.rs	50.00%	4 Missing ⚠️
rust/lance-namespace/src/schema.rs	33.33%	0 Missing and 4 partials ⚠️


github-actions bot added enhancement, python, java labels Feb 5, 2026

wojiaodoubao mentioned this pull request Feb 5, 2026

Epic: partitioned namespace #5741

Open

4 tasks

wojiaodoubao commented Feb 5, 2026


wojiaodoubao force-pushed the partitioned-namespace-new branch from 23d594c to b61565e Compare February 5, 2026 13:44

wojiaodoubao marked this pull request as draft February 5, 2026 14:49

wojiaodoubao force-pushed the partitioned-namespace-new branch from b61565e to dc95f18 Compare February 5, 2026 15:45

jackye1995 self-requested a review February 6, 2026 05:02

jackye1995 reviewed Feb 6, 2026


jja725 mentioned this pull request Feb 24, 2026

feat(connector): Presto Lance Connector prestodb/presto#27185

Open

7 tasks

wojiaodoubao force-pushed the partitioned-namespace-new branch from dc95f18 to fadadad Compare February 25, 2026 15:21

wojiaodoubao added 2 commits February 25, 2026 23:49

feat: add partitioned namespace

f8da5fc

refactor: move manifest_ext to manifest

7940d6b

wojiaodoubao force-pushed the partitioned-namespace-new branch from fadadad to 7940d6b Compare February 25, 2026 15:53

wojiaodoubao marked this pull request as ready for review February 26, 2026 06:23

wojiaodoubao wants to merge 2 commits into lance-format:main from wojiaodoubao:partitioned-namespace-new


2 participants
